Word Clouds with Latent Variable Analysis for Visual Comparison of Documents
Abstract
The word cloud is a form of text visualization that is recognized for its aesthetic, social, and analytical values. Here, we are concerned with deepening its analytical value for the visual comparison of documents. To support comparative analysis of two or more documents, users need to be able to perceive similarities and differences among the documents through their word clouds. However, because we are dealing with text, approaches that treat words independently may impede accurate discernment of similarities among word clouds that contain different words with related meanings. We therefore motivate the principle of displaying related words in a coherent manner, and propose to realize it by modeling the latent aspects of words. Our WORD FLOCK solution brings together latent variable analysis for embedding and aspect modeling, and a calibrated layout algorithm, within a synchronized word cloud generation framework. We present quantitative and qualitative results on real-life text corpora, showing how the resulting word clouds preserve the information content of documents and thereby allow more accurate visual comparison of documents.
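The two ideas in the abstract (related words should read as similar, and clouds should be synchronized for comparison) can be illustrated with a toy sketch. This is not WORD FLOCK's actual pipeline: the per-word aspect probabilities (`WORD_ASPECTS`) and shared 2D coordinates (`WORD_XY`) are hand-set hypothetical values standing in for what a latent variable model and embedding would learn, and the helper names are illustrative only. The sketch shows that two documents with disjoint vocabularies get zero similarity when each word is an independent dimension, but high similarity once words are mapped to latent aspects. It then places every word at one shared position across all clouds, with font size scaled by frequency and color given by the dominant aspect.

```python
import math

# Hypothetical toy values (NOT learned, NOT from the paper):
# probability of each word under two latent aspects.
WORD_ASPECTS = {
    "car": (0.9, 0.1),
    "vehicle": (0.85, 0.15),
    "engine": (0.8, 0.2),
    "recipe": (0.1, 0.9),
    "oven": (0.15, 0.85),
}

# Hypothetical shared 2D coordinates per word, standing in for an embedding.
WORD_XY = {
    "car": (0.1, 0.2), "vehicle": (0.15, 0.25), "engine": (0.2, 0.1),
    "recipe": (0.8, 0.9), "oven": (0.85, 0.8),
}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def word_vector(doc, vocab):
    """Bag-of-words counts: every word is its own independent dimension."""
    return [doc.get(w, 0) for w in vocab]

def aspect_vector(doc):
    """Frequency-weighted sum of each word's latent-aspect distribution."""
    k = len(next(iter(WORD_ASPECTS.values())))
    acc = [0.0] * k
    for w, count in doc.items():
        for i, p in enumerate(WORD_ASPECTS[w]):
            acc[i] += count * p
    return acc

def synchronized_cloud(doc, min_size=10, max_size=40):
    """Return (word, shared_xy, font_size, aspect_color) tuples.
    Each word keeps the same coordinate in every document's cloud;
    overlap removal is deliberately ignored in this sketch."""
    top = max(doc.values())
    cloud = []
    for w, count in sorted(doc.items()):
        size = min_size + (max_size - min_size) * count / top
        aspects = WORD_ASPECTS[w]
        color = max(range(len(aspects)), key=aspects.__getitem__)
        cloud.append((w, WORD_XY[w], round(size, 1), color))
    return cloud

doc_a = {"car": 4, "engine": 2}
doc_b = {"vehicle": 3, "oven": 1}

if __name__ == "__main__":
    vocab = sorted(set(doc_a) | set(doc_b))
    print(cosine(word_vector(doc_a, vocab), word_vector(doc_b, vocab)))  # 0.0
    print(cosine(aspect_vector(doc_a), aspect_vector(doc_b)))  # about 0.96
    print(synchronized_cloud(doc_a))
    print(synchronized_cloud(doc_b))
```

Because "car" and "vehicle" never co-occur as the same token, the word-level comparison sees nothing in common, while the aspect-level comparison recovers their shared meaning. A real layout would additionally resolve collisions between words, which this sketch does not attempt.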
Similar resources
Semantic-Preserving Word Clouds by Seam Carving
Word clouds are proliferating on the Internet and have received much attention in visual analytics. Although word clouds can help users understand the major content of a collection of documents quickly, their ability to visually compare documents is limited. This paper introduces a new method to create semantic-preserving word clouds by leveraging tailored seam carving, a well-establis...
Spatial Latent Dirichlet Allocation
In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” an...
WordWanderer: A Navigational Approach to Text Visualisation
Text visualisations provide visual representations of documents or small corpora with the primary aim of supporting language analysis. We are interested in developing a more playful approach to language that can be characterised by the notion of wandering as an open-ended movement. To support such a casual form of engagement with text, we designed the WordWanderer system: a visualisation techni...
Semantic Wordification of Document Collections
Word clouds have become one of the most widely accepted visual resources for document analysis and visualization, motivating the development of several methods for building layouts of keywords extracted from textual data. Existing methods are effective to demonstrate content, but are not capable of preserving semantic relationships among keywords while still linking the word cloud to the underl...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting aims to make unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Publication date: 2016